SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching.
Authors
Abstract
With the advent of next-generation sequencing (NGS), it has become possible to sequence entire genomes quickly and inexpensively. However, some experiments only need to extract and assemble a portion of the sequence reads. Examples include transcriptome studies, mitochondrial genome sequencing, and exome characterization. Given the raw DNA library of a complete genome, identifying the reads of interest might appear to be a trivial problem. In practice, well-known tools such as BLAST, BLAT, Bowtie, and SOAP cannot always be incorporated directly into a bioinformatics pipeline before the assembly stage. Their output may be incompatible with the assembler's input files, or the pipeline may need information that must be extracted separately. For example, to use flowgrams from a Roche 454 sequencer in the Newbler assembler, they must first be extracted from the original SFF files. We present SlopMap, a bioinformatics software utility that quickly identifies reads similar to provided reference sequences in either Roche 454 or Illumina DNA libraries. SlopMap has a simple, intuitive command-line interface and writes output file formats that are compatible with assembly programs, so it can be embedded directly into biological data-processing pipelines without any additional programming. In addition, SlopMap preserves the flowgram information needed by the Roche 454 assembler.
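SlopMap's source is not reproduced here, so the Python sketch below only illustrates the general idea of exact k-mer matching named in the title. It makes several assumptions. The value of `K`, the `MIN_SHARED` threshold, the inclusion of reverse-complement k-mers, and all function names are hypothetical, not SlopMap's documented parameters or interface. The sketch indexes every k-mer of the reference sequences on both strands. It then keeps the reads that share at least `MIN_SHARED` distinct k-mers with that index.

```python
from typing import Dict, Iterable, Iterator, List, Set

# Hypothetical parameters chosen for illustration only; they are not SlopMap defaults.
K = 5           # k-mer length
MIN_SHARED = 2  # minimum number of distinct shared k-mers for a read to be selected

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every length-k substring of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Iterable[str], k: int) -> Set[str]:
    """Collect the k-mers of all references on both strands (assumption: strand-agnostic matching)."""
    index: Set[str] = set()
    for ref in references:
        ref = ref.upper()
        index.update(kmers(ref, k))
        index.update(kmers(reverse_complement(ref), k))
    return index


def shared_kmers(read: str, index: Set[str], k: int) -> int:
    """Count the distinct k-mers of the read that occur exactly in the index."""
    return len(set(kmers(read.upper(), k)) & index)


def select_reads(reads: Dict[str, str], references: List[str],
                 k: int = K, min_shared: int = MIN_SHARED) -> List[str]:
    """Return the names of reads sharing at least min_shared exact k-mers with the references."""
    index = build_index(references, k)
    return [name for name, seq in reads.items()
            if shared_kmers(seq, index, k) >= min_shared]


if __name__ == "__main__":
    reference = ["ACGTACGTTAGC"]
    reads = {
        "read1": "TTACGTACGTAA",                       # overlaps the reference's forward strand
        "read2": "GGGGGGGGGGGG",                       # unrelated sequence
        "read3": reverse_complement("CGTTAGCAA"),      # comes from the opposite strand
    }
    print(select_reads(reads, reference))  # prints ['read1', 'read3']
```

A hash set gives expected constant-time lookups per k-mer, so scanning a library costs time roughly linear in its total read length. A real tool would stream reads from FASTQ or SFF files rather than hold them in a dictionary, and it would typically use a much larger k than this toy value.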
Similar resources
Exact Mixed Integer Programming for Integrated Scheduling and Process Planning in Flexible Environment
This paper presents a mixed integer programming model for integrated scheduling and process planning. The proposed process plan includes some orders with precedence relations similar to the Multiple Traveling Salesman Problem (MTSP), which is categorized as an NP-hard problem. These types of problems are also called advanced planning because of simultaneously determining the appropriate sequence and m...
Application of Support Vector Machine Regression for Predicting Critical Responses of Flexible Pavements
This paper aims to assess the application of Support Vector Machine (SVM) regression to the analysis of flexible pavements. To this end, 10000 four-layer flexible pavement sections consisting of an asphalt concrete layer, granular base layer, granular subbase layer, and subgrade soil were analyzed under the effect of standard axle loading using multi-layered elastic theory and pavement critical r...
QuicK-mer: A rapid paralog sensitive CNV detection pipeline
QuicK-mer is a unified pipeline for estimating genome copy-number from high-throughput Illumina sequencing data. QuicK-mer utilizes the Jellyfish application to efficiently tabulate counts of predefined sets of k-mers. The program performs GC-normalization using defined control regions and reports paralog-specific estimates of copy-number suitable for downstream analysis. The package is freely ...
Application of Artificial Neural Networks for Analysis of Flexible Pavements under Static Loading of Standard Axle
In this study, an artificial neural network was developed in order to analyze flexible pavement structure and determine its critical responses under the influence of standard axle loading. In doing so, more than 10000 four-layered flexible pavement sections composed of asphalt concrete layer, base layer, subbase layer, and subgrade soil were analyzed under the impact of standard axle loading. P...
Neptune: A Tool for Rapid Microbial Genomic Signature Discovery
Neptune locates genomic signatures using an exact k-mer matching strategy while accommodating k-mer mismatches. The software identifies sequences that are sufficiently represented within “inclusion targets” and sufficiently absent from “exclusion targets”. The signature discovery process is accomplished using probabilistic models instead of heuristic strategies. We have evaluated Neptune on L...
Journal title:
- Journal of data mining in genomics & proteomics
Volume 4, Issue 3
Pages: -
Publication date: 2013